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Abstract 

One approach for solving interacting many-fermion systems is the configuration- 
interaction method, also sometimes called the interacting shell model, where 
one finds eigenvalues of the Hamiltonian in a many-body basis of Slater de- 
terminants (antisymmcterized products of single-particle wavefunctions) . The 
resulting Hamiltonian matrix is typically very sparse, but for large systems the 
nonzero matrix elements can nonetheless require terabytes or more of storage. 
An alternate algorithm, applicable to a broad class of systems with symmetry, 
in our case rotational invariance, is to exactly factorize both the basis and the 
interaction using additive/multiplicative quantum numbers; such an algorithm 
can reduce the storage requirements by an order of magnitude or more. We dis- 
cuss factorization in general as well as in the context of a specific configuration- 
interaction code, BIGSTICK, which runs both on serial and parallel machines. 

Keywords: shell model, configuration interaction, many-body 



1. Introduction 

The quantum mechanics of many-body systems, specifically when the num- 
ber of particles is between three and a few hundred, is a theoretical and compu- 
tational challenge. Often these systems have exact symmetries that can be either 
a curse or a gift. The quantum numbers associated with each symmetry should 
be treated exactly, hindering approximations. Conversely, some quantum num- 
bers can aid calculation by excluding trivial matrix elements via selection rules. 
The algorithmic exploitation of quantum numbers for efficient calculations in 
many-body systems is the theme of this paper. 

Central to the methods described here are abelian symmetries: in practi- 
cal terms this means the quantum numbers for many-body systems are simply 
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the sum or product of the quantum numbers of the constituent single-particle 
states. Examples include parity and, most importantly for us, the z-component 
of angular momentum. In this paper, we we focus on systems of fermions with 
rotationally invariant Hamiltonians: multi-electron atoms, atomic nuclei, and 
general fermions (e.g., cold atoms) in a spherically symmetric trap. Also key 
to our methods is the presence of two 'species' of fermions, e.g. protons and 
neutrons, spin-up and spin-down electrons, etc. While there is a long menu of 
methods and approximations to tackle such systems, we focus on configuration- 
interaction (CI) calculations, finding low-lying eigenvalues of the Hamiltonian 
matrix computed in a large-dimension basis [l|, 

uses a many-body basis of Slater determinants, antisymmeterized products of 
single-particle wavefunctions that can be rendered trivially orthonormal and, as 
we will see, which lend to rapid computation of Hamiltonian matrix elements. 
The advantages of CI are (a) it is fully microscopic; (b) allows for arbitrary 
single-particle basis; (c) allows for arbitrary form of the two-body interaction, 
i.e., no restriction on locality or momentum dependence, etc.; (d) is equally ef- 
fective for even or odd numbers of particles and has no difficulty with 'open shell' 
systems; and (e) can compute multiple excited states with the same quantum 
numbers. The main disadvantage of CI is it is not size-extensive and so contains 
a large number of unlinked diagrams that must be canceled [l| ; this leads to a 
slow convergence as the dimension of the model space increases. This means CI 
bases often have very large dimensions, and one often turns to an effective inter- 
action that partially sums over many single-particle states. Other many-body 
methods have different sets of advantages and disadvantages, of course. 

Configuration-interaction calculations expand the many-body wavefunction 
in an orthonormal basis, 

i*> = £<*,!<*>. C 1 ) 

a. 

Subsequently the central computational goal of a CI calculation is to find eigen- 
values and eigenvectors of a very large matrix H a p = (a\H\(3), that is, to solve 

Hc„ = E n c n . (2) 

Because one is generally interested only in low-lying states, typically the lowest 
5-20 states, one can use Arnoldi methods such as the Lanczos algorithm fl2, 13|. 
where one iteratively transforms the Hamiltonian to tridiagonal form: 

H|u„) = 6„_i|v„_i) + a n \v n ) + b n \v n+ i). (3) 

This creates a unitary transformation to a new basis. The advantage of Lanczos 



over other unitary transformations to tridiagonal form such as Householder 12 [ 



is that one does not need to complete the transformation. If we truncate after 
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n — 1 Lanczos iterations, leaving us with the approximate matrix 
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the extremal eigenvalues converge quickly 1 131 . which one can understand through 
the lens of the classical moments problem [14j. The downside of Lanczos is that, 
due to numerical round-off error the Lanczos vectors \v n ) lose orthogonality and 
must be forcibly orthonormalized, which is why Householder is often preferred 
when one must completely transform a matrix to tridiagonal form. 

The important point to take away is that matrix- vector multiplication is the 
fundamental operation. In large cases, in the so-called M-scheme (described 
below), the dimensions can be upwards of 10 7-10 ; the associated matrix is very 
sparse, roughly 10~ 4 -10~ 6 for two-body Hamiltonians. Specific examples for 
nuclei are given in Table I, demonstrating that storage of just nonzero matrix 
elements requires hundreds of gigabytes, terabytes, and even petabytes; if one 
uses 3-body interactions, discussed in Section l3~5l and illustrated in Tabled the 
matrices are significantly less sparse and the storage demands even higher. In 
fact, one can argue that it is not the dimensionality of the CI vector as in Eq. ([T]) 
but the number of nonzero matrix elements that governs the computational 
difficulty of a CI calculation-a completely dense matrix of dimension 10 6 has 
much higher demand on memory than a matrix of dimension 10 s but with 
sparsity 10~ 6 . 

There are two approaches to the problem of a large, sparse matrix. First, 
one can simply store the nonzero matrix elements. In nuclear physics the CI 
code of Whitehead et al. UM and later the OXBASH code UM and its successor 



NuShell [16[ stored the matrix elements on disk, but for modern computers 
reading data from disk dramatically slows down the algorithm. Alternately, one 
can store the matrix elements in RAM, which is much faster but for all but the 
most modest of problems requires distribution across hundreds or thousands of 
nodes on a parallel computer, as done with the MFDn code 17 1. 

Not only does storage of the many-body Hamiltonian matrix elements put 
an enormous drain on memory resources, it is wasteful: the non-zero matrix ele- 
ments are not unique but have an enormous redundancy, each one reused many 
times over, as matrix elements between pairs of particles are generally unique, 
but in a many-body system there are a large number of possible combinations 
of inert spectators. 

An alternate to storage of the many-body matrix elements is to recreate 
them on the fly, which by reducing redundancy requires one or two orders of 
magnitude less memory. On-the-fly recalculation can be made surprisingly ef- 
ficient by factorizing the problem into complementary parts, using quantum 
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Table 1: Some model spaces for atomic nuclei and their M-scheme (M = 0) dimensions and the 
sparsity for a two-body Hamiltonian. "Storage" refers to memory requirements in gigabytes 
for the nonzero matrix elements. The model spaces are described in detail in Appendix B; 
JVsheii includes all configurations, while iV max is a truncation on the non-interacting energy. 



Nuclide 


space 


basis 


sparsity 


storage 






dim. 




(GB) 


^ 8 Si 


sd 


9.4 x 10 4 


6 x 10" 3 


0.2 


52p e 


pf 


1.1 x 10 8 


1 x 1CT 5 


720 


56 Ni 


pf 


1.1 x 10 9 


2 x 1CT 6 


9600 


4 Hc 


N a be\l = 8 


2.9 x 10 7 


4 x 10~ 4 


1440 


4 Hc 


iVshell = 10 


2.7 x 10 8 


1 x 1CT 4 


36,000 


4 He 


iVmax = 16 


6.3 x 10 6 


1 x 10~ 3 


200 


4 Hc 


^max = 22 


8.6 x 10 7 


3 x 1CT 4 


9600 


12 C 


N s he\\ = 3 


8.2 x 10 7 


1 x 1CT 5 


400 


12 C 


N shc n = 4 


5.9 x 10 11 


8 x 10~ 9 


1 x 10 7 


12 C 


N max = 8 


5.9 x 10 8 


4 x 1CT 6 


5200 


13 C 


-/Vmax = 6 


3.8 x 10 7 


4 x 10~ 5 


210 



numbers. (Quantum numbers, which label irreducible representations of sym- 
metry groups, are generally found as eigenvalues of commuting operators; for a 
review see |Appendix A] Note that although we focus on continuous symmetries, 
that is rotation, factorization will work for any abelian symmetry, for which one 
can compute the quantum number for a many-body state by simply adding or 
multiplying the quantum numbers of the single-particle states. Therefore the 
methods described herein could be carried over to discrete symmetries, not only 
parity but also point-group symmetries as long as there is an abelian subgroup 
such as the cyclic group of order n, C n .) In the next section, we describe how 
one can, for a broad class of cases, efficiently factorize the many-body basis, 
and, in the following section, we discuss factorization in applying the Hamilto- 
nian matrix. These factorization methods were used in several major CI codes, 
ANTOINE [H, NATHAN 0, NuShellX [H], and our own unpublished codes 
REDSTICK and BIGSTICK. Methods similar to factorization have been used 
in quantum chemistry (see Ref. 13, U| and references therein), which allowed 
quantum chemists to reach dimensions of over a billion (but again: the com- 
putational difficulty is not only the dimensionality of the vectors but also the 
number of nonzero matrix elements) . Factorization has also been used in nuclear 



structure physics as a gateway to approximation schemes [20|, |2l|, |22|, [23 1 . 

While the basic ideas are outlined in Ref. [IH, in the following two 
sections we present in detail how factorization work, both for the basis and for 
the Hamiltonian, and show explicitly how it reduces memory load. In Section 
[4] we give a new application that further exploits this approach and forms 
the heart of our latest CI code, BIGSTICK. Lastly, we discuss parallclization 
of the algorithm and its performance. Most of our examples are taken from 
atomic nuclei, but in |Appendix C| wc also give similar numbers for the electronic 
structure of atoms. 
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A note about conventions. We use lower-case Greek letters, typically a, (3 to 
label many-body basis states, which as explained below will be Slater determi- 
nants. When we restrict a Slater determinant to a single species of particle (e.g., 
protons or neutrons, or spin- up or spin-down electrons), we typically use v 
and add a suffix, e.g. (i p or for a generic, unspecified species we use x and y. 
Single-particle states we label with i,j,k,l. Quantum numbers associated with 
single-particle states are denoted by lower-case letters, e.g. j, m, ir, while for 
many-body states we use capital letters J, M, and II. 



2. Factorization of the basis 

The concept of factorization can be seen most easily in an efficient represen- 
tation of the many-body basis. In order to exploit factorization we must have 
two (or more) species of fermions, for example, protons and neutrons, or spin-up 
and spin-down electrons (as done sometimes in quantum chemistry 1^, or 



two spin-species of cold atoms. For generality we label these species x and y. 
Then any wavefunction can be expanded in a sum of product wavef unctions, 

|*) = ^c^|/i x )K) (5) 

where we already see the basis states factorized. 

Like good physicists, we build our many-body state \Hx),\vy) from simple 
components, and start with a finite set of orthonormal single-particle states 
{4>i}. For our purposes, the single-particle states must have as good quantum 
numbers total angular momentum, j, and z-component of angular momentum, 
m. While for the nuclear case these are assumed to take on half-integer val- 
ues, the algorithm can be trivially generalized to integer values, for example, 
if one has spin-up and spin-down electrons as separate species, as described in 
Appendix | Appendix C| One generally also assumes good parity ir. 

Thus, one can imagine the single-particle states as cigenstates of a rota- 
tionally invariant single-particle Hamiltonian. The Hamiltonian that generates 
these single-particle states is fictitious and is chosen for convenience; in the limit 
of an infinite space, the final result will not depend on the single-particle states. 
For finite spaces, of course, the choice of single-particle states can critically af- 
fect convergence, and in nuclear physics one often chooses harmonic oscillator or 
mean-field single-particle states. The radial dependence of each <pi only enters 
into the numerical values of the interaction matrix elements, which are evalu- 
ated externally and read in as a file. All we need to know for each state 4>i are 
the quantum numbers j% , rrii and optionally iti . We assume for a given ji all 
possible wii are allowed. 

To illustrate, consider a specific example from the structure of atomic nuclei. 
The sd valence space contains the lsi/2, Od^/2 and Od.5/2 orbits the quantum 
numbers of which are given in Table [21 Throughout the rest of this paper we 
will give examples built upon this single particle space. An alternate set of 
examples from the electronic structure of atoms can be found in |Appcndix C| 
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Tabic 2: Ordered list of single-particle states in the sai-shell for atomic nuclei. 



State 


I 


j 


m 3 


1 





1/2 


-1/2 


2 





1/2 


+ 1/2 


3 


2 


3/2 


-3/2 


4 


2 


3/2 


-1/2 


5 


2 


3/2 


+ 1/2 


6 


2 


3/2 


+3/2 


7 


2 


5/2 


-5/2 


8 


2 


5/2 


-3/2 


9 


2 


5/2 


-1/2 


10 


2 


5/2 


+ 1/2 


11 


2 


5/2 


+3/2 


12 


2 


5/2 


+5/2 



Now that we have the single-particle states, we construct the many-body 
states, also known as Slater determinants, using antisymmeterized products of 
single-particle states. A coordinate-space Slater determinant for n particles is 
written as 



W2) 



4>2(r 2 ) 



4>i{r n ) 4>2(r n ) 



<i>n{r 2 ) 



(6) 



A Slater determinant can, however, be compactly represented using second 
quantization 0, [24j ■ Let d\ be the creation operator associated with the ith 
state 4>i(r); then the occupation- or number-space representation of a Slater 
determinant of n particles is given by 



a a 



1"2 



.4|0) 



(7) 



where |0) is the vacuum state (or an inert/frozen core). Different Slater deter- 
minants will have different combinations of {4>i} and thus use different combina- 
tions of creation operator {a^}. So, for example, drawing up the single-particle 
states in Tabled some possible five-particle states might be ajdJagaj^slO)! 
a\alala\a,Q\0) , aJjajjaiofilia^lO): and so on. The ordering is important but 
only insofar as a different ordering can lead to a phase due to antisymmetry: 



a\a\ 



-a\a\, etc.; while we will not emphasize the phase, getting the phase 



right is critical. 

The Pauli exclusion principle means each single-particle state can be oc- 
cupied by at most one particle. This is particularly convenient for comput- 
ers, as a bit-representation of the occupation of single-particle states is natural 
13|. Thus our example Slater determinants become, respectively, 111110000000, 
111101000000, and 011000000111. Again, the ordering is arbitrary but must be 
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fixed consistently in order to determine the phase (often one starts from the 
right rather than from the left as we have-the only important thing is consis- 
tency keeping track of the ordering) . The sets of bits can be simply stored as 
integers, and manipulation of the individual bits is straightforward if somewhat 
tedious |13j |. An additional convenience is that if the {<j)i\ form an orthonormal 
set, the resulting Slater determinants will also be orthonormal. For 5 protons 

/ 12 \ 

in the sd valence space (Tabled, there are a total of _ = 792 possible 



Slater determinants. 

Not every combination is needed, however, as explained in the next two 
sections. 

2.1. The M -scheme basis 

We invoke the critical assumption that the many-body Hamiltonian is rota- 
tionally invariant, so that both total J 2 and total J z commute with the Hamil- 
tonian and the eigenstates of the Hamiltonian have both J and M as good 
quantum numbers, respectively. This in turn means that the Hamiltonian will 
not connect many-body states with different M. Therefore we choose an M- 
scheme basis, that is the many-body basis states all have the same M . The 



original Whitehead code [13(, ANTOINE [18|, and MFDn [17J are all M-scheme 
codes. This is convenient because m is an additive quantum number: to deter- 
mine the M for a Slater determinant, one just has to add the m, of the occupied 
single-particle states. 

One can create a J- scheme basis, where the basis states also have fixed J; 
the many-body Hamiltonian is block-diagonal in J. In atomic physics, these are 
called configuration-state functions (CSFs). But this is less straightforward, as 
each basis state must be represented as a linear combination of M-scheme basis 
states. The J-scheme basis has both advantages and disadvantages, which we 
will not discuss here. Examples of J-scheme codes in nuclear physics include 
OXBASH [H] and its successors NuShell and NuShellX and NATHAN 0; 
OXBASH and NuShell store the many-body Hamiltonian matrix elements on 
disk, while the latter two utilize factorization, but a discussion of factorization 
with a J-scheme basis is not the intent of this paper. 

Using the single-particle states in Table[2j the state aja^dgajaj; |0) has M = 
—3/2, the state aja^aga^aglO) has M = —1/2, and the state a^aJajga^o^lO) 
has M = +7/2. While the total number of five-particle states in this valence 
space is 792, there are 119 states with M = 1/2, 104 with M = 3/2, 80 with 
M = 5/2, and 51 with M = 7/2, 28 with M = 9/2, 11 with M = 11/2, and 
3 with M = 13/2 ( the same number apply for M = —1/2,-3/2, etc.) For 6 

f 12 \ 

neutrons in the sd valence space, there are a total of „ = 924 possible 



6 

many-body states, but if we restrict ourselves to M = there are only 142 
states. 

As we assume rotational invariance, eigenstates should have good J and 
M, and the eigenvalue can only depend upon J and not M. The M-scheme 
eliminates the rotational degeneracy and reduces the size of the basis. 
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These simple examples are for a single species of particles. With two species 
of particles, the many-body basis becomes more complex, but factorization al- 
lows a compact representation, as we discuss next. 

2.2. Factorization of the M -scheme basis and basis 'sectors' 

In the expansion defined by Eq. both the z-species state \/j, x ) and the 
y-state \ v y ) are represented by Slater determinants. Now we can begin to see the 
usefulness of factorization. One could represent each final basis state by a single 
Slater determinant, by simply combining the bit-strings, but this is inefficient, 
because in general any given x-species state \fx x ) can be combined with more 
than one y-state \v y ) in constructing the basis. This means one can construct 
the many-body basis from a small number of components. We will give more 
detailed examples below, but let us consider the case of the 27 A1 nucleus, using 
the sd valence space. This assumes five valence protons and six valence neutrons 
above a frozen 16 core. The total dimension of the many-body space is 80,115, 
but this is constructed using only 792 five-proton states and 923 six-neutron 
states. 

The reader will note that 792 x 923 = 731016 > 80115. Indeed, not every 
five-proton state can be combined with every six-neutron state. The restric- 
tion is due to conserving certain additive quantum numbers, and this restric- 
tion turns out to limit usefully the nonzero matrix elements of the many-body 
Hamiltonian, which we will discuss more in the next section. 

For our example, we chose total M = +1/2 (though we could have chosen 
a different half-integer value). This basis requires that M p + M n = M; and 
for some given M p , every proton Slater determinant with that M p combines 
with every neutron Slater determinant with M„ = M — M p . This is illustrated 
in Table El which shows how the many-body basis is constructed from 792 
proton Slater determinants and 923 neutron Slater determinants. Note we are 
"missing" a proton Slater determinant; the lone M p = —7 state has no matching 
(or 'conjugate') neutron Slater determinants. 

As a point of terminology, we divide up the basis (and thus any wavefunction 
vectors) into sectors, each of which is labeled by M p , and any additional quan- 
tum numbers such as parity n p ; that is, all the basis states constructed with 
the same M p (n p , etc.) belong to the same basis 'sector' and have contiguous 
indices. Basis sectors are also useful for grouping operations of the Hamiltonian, 
as described below, and can be the basis for distributing vectors across many 
processors, although because sectors are of different sizes this creates nontrivial 
issues for load balancing. 

This factorization is further illustrated in Fig. [T] we can think of the many- 
body basis being formed by rectangles, the sides of which are sectors of the 
proton and neutron Slater determinants, organized by M p and M n . Again, one 
can generalize this to other additive/multiplicative quantum numbers such as 
parity: in multi-shell calculations the total parity is fixed with II = YL p x II n . 

The factorization leads to an impressive compactification of the basis. We 
could explicitly write down each of the 80,115 basis states in terms of their 
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Table 3: Decomposition of the M-scheme basis for 5 protons and 6 neutrons in the sd valence 
space ( 27 A1), with total M = M p + M n + 1/2. Here "pSD" = proton Slater determinant 
and "nSD" = neutron Slater determinant, while "combined" refers to the combined pro- 
ton+neutron many-body basis states. The subset of the basis labeled by fixed M p (and thus 
fixed M n ) we label a 'sector' of the basis. 



M p 


# pSDs 


M n 


# nSDs 


# combined 


+ 13/2 


3 


-6 


9 


27 


+ 11/2 


11 


-5 


21 


231 


+9/2 


28 


-4 


47 


1316 


+7/2 


51 


-3 


76 


3876 


+5/2 


80 


-2 


109 


8720 


+3/2 


104 


-1 


128 


13,312 


+1/2 


119 





142 


16,898 


-1/2 


119 


+ 1 


128 


15,232 


-3/2 


104 


+2 


109 


11,336 


-5/2 


80 


+3 


76 


6080 


-7/2 


51 


+4 


47 


2444 


-9/2 


28 


+5 


21 


588 


-11/2 


11 


+6 


9 


99 


-13/2 


3 


+7 


1 


3 


Total 


792 




923 


80,115 



component proton and neutron Slater determinants, that is, 

R = K)K). (8) 

Here we index the M-scheme basis by a = 1, ... , 80115, while the proton Slater 
determinants are indexed by fi p = 1, . . . , 792 and the neutron Slater determi- 
nants by v n = 1, . . . , 923. It is straightforward to construct index functions 
f p J n such that [SIS 

a = fp(fJ-p) + fn{Vn) (9) 

This is an example of factorization. Instead of storing explicitly each and 
every basis state, one only needs the much smaller set of proton and neutron 
Slater determinants, and the indexing functions to map to the combined many- 
body basis. Table S] gives the number of component proton and neutron Slater 
determinants for a number of representative cases. 

One can also introduce some useful truncations of the many-body basis, also 
based upon additive weights that act like quantum numbers. In order not to 
muddy the waters, we give a description of a specific scheme in | Appendix D 

Now that we have introduced the concept of factorization for the basis, we 
turn to its usage for the interaction. 



M =+9/2 

. i' 



proton Slater determinants 



■ 

II 

B 



M =+5/2 

p 



Figure 1: Illustration of factorization of the M-schemc many-body basis. Along the x-axis 
we have proton Slater determinants ordered by M p , while along the y-axis we have neutron 
Slater determinants also ordered by M n (but in reverse order) . Any proton Slater determinant 
with given M p will combine with any and all neutron Slater determinants with conjugate 
M n = M — M p . Each block therefore represents a sector of the basis in our terminology. This 
example is for M = 1/2, taken from the text. 



Table 4: Factorization of the M-scheme basis (M = for even A, or 1/2 for odd A) for selected 
atomic nuclei in terms of the number of proton and neutron Slater determinants needed. The 
model spaces are described in Appendix B. 



Nuclide 


space 




basis 


proton 


neutron 


# basis 








dim. 


SDs 


SDs 


sectors 




sd 




9.4 x 10 4 


924 


924 


15 


56p e 


pf 




5.0 x 10 s 


38760 


184,722 


27 


56 Ni 


pf 




1.1 x 10 9 


1.2 x 10 5 


1.2 x 10 5 


29 


6 He 


N s be\l = 


6 


1.4 x 10 9 


6216 


6 x 10 6 


42 


4 He 


NsbeQ = 


10 


2.7 x 10 s 


96,580 


96,580 


74 


9 Li 


^max 


10 


3.5 x 10 s 


1.5 x 10 5 


1.4 x 10 7 


126 


9 Be 


max — 


10 


5.7 x 10 8 


1.1 x 10 6 


5.1 x 10 6 


136 


4 He 


-^V max — 


22 


8.6 x 10 7 


3 x 10 5 


3 x 10 5 


307 


14 C 


^Vshcll = 


3 


2.6 x 10 s 


4 x 10 4 


1 x 10 5 


34 


12 C 


N s hcU = 


4 


5.9 x 10 11 


4 x 10 6 


4 x 10 6 


58 


U C 


A^max = 


8 


5.9 x 10 8 


5 x 10 6 


5 x 10 6 


105 
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3. The Hamiltonian and its factorization 



In second quantization, a Hamiltonian takes the form 



// X /'<•">••• 7>>< 



ijkiaiO-jO-io-k 



(10) 



ij ijkl 



where T,j are the one-body Hamiltonian matrix elements, which may include 
kinetic energy and some external or mean-field potential, and Viju > the two- 
body matrix elements; as the latter is the primary computational challenge, we 
ignore the one-body part. One can also add three-body terms, etc., but the 
underlying principals arc unchanged. 

The matrix elements are not uncorrelated, due to hcrmiticity, fermion an- 
tisymmetry, and, most germane to this discussion, rotational invariance. The 
details can be found in standard monographs, e.g. [2j . For our purposes, we only 
care about selection rules which arise from additive or multiplicative quantum 
numbers. What this means is, for example, the requirement that, unless 



then Vijki = 0. This is an example of a conservation law/selection rule. A 
similar section rule exists for parity. Below we discuss how we can exploit this 
for two-species systems, but first, we discuss why the M-schcmc Hamiltonian 
matrix is so sparse, and, furthermore, why the non-zero matrix elements have 
enormous redundancy. 

3.1. Sparsity and redundancy of the matrix elements 

Table I illustrated the sparsity of M-scheme Hamiltonian matrices. This can 
be understood as arising because we restrict ourselves to a two-body Hamilto- 
nian. For the novice we expand upon this idea here. 

As explained above, one can represent the basis Slater determinants as binary 
numbers, with a '1' representing an occupied state and a '0' an occupied state. 
The action of a destruction operator ai is to replace a '1' in the zth position 
with a '0', while a creation operator a\ does the opposite: 



(The phases arise because of fermion anticommutation relations: every time 
an annihilation operator anticommutes past a creation operator, we pick up a 
minus sign.) Trying to create a particle where there already is one gives a zero 
amplitude; this is the Pauli exclusion principle for fermions. Conversely, trying 
to destroy a particle where none exists also gives a zero amplitude. 

Thus, the action of the operator ajajajafc is to destroy particles in states 
k and I and put particles in states i and j. To simplify, let's assume i,j,k, 



m,i + m, —mk—mi 



o 4 |01011100) = -101001100); 
4|01001100) = -101011100); 
a 4 |01001100) = 0; a||01011100> = 0. 



(11) 
(12) 

(13) 
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Table 5: Some model spaces for atomic nuclei and their M-scheme (M = 0) dimensions and 
the average redundancy of the Hamiltonian matrix elements (m.e.s), defined as the ratio of 
the number of nonzero matrix elements (excluding Hermitian conjugates) to the number of 
unique matrix elements. The model spaces are described in Appendix B. 



Nuclide 


space 


basis 


^ nonzero 


# unique 


average 






dim 


m.e.s 


m.e.s 


redundancy 


^ 8 Si 


sd 


9.4 x 10 4 


2.6 x 10 y 


2800 


9300 


52p e 


pf 


1.1 x 10 s 


8.9 x 10 10 


2.2 x 10 4 


4 x 10 6 


56 Ni 


pf 


1.1 x 10 9 


1.2 x 10 12 


2.2 x 10 4 


5 x 10 7 


4 Hc 


^Vshcll = 8 


2.9 x 10 7 


1.8 x 10 11 


1.4 x 10 8 


1300 


4 He 


Mnax = 16 


6 x 10 6 


2.5 x 10 10 


1.5 x 10 9 


18 


12 C 


A^shcll = 3 


8.2 x 10 7 


5.2 x 10 10 


1.6 x 10 4 


3 x 10 6 


12 C 


"max = 8 


6 x 10 8 


6.4 x 10 11 


5 x 10 7 


1 x 10 4 



and I are all distinct. Then the amplitude of this operator between two Slater 
determinants yU and /i' , (^'|a|a^a;afe|/j,) is zero unless: 

• The states i, j are occupied in |^) and unoccupied in |//) ; 

• The states k, I are occupied in and unoccupied in |/it); and 

• all other particles in and occupy identical states; these are spectators 
and do not change as one goes from [i to //. 

As one might imagine, these stringent conditions are difficult to meet; hence, 
the sparsity found in Table I. 

Now the story is not done. Not only is the Hamiltonian matrix sparse, the 
nonzero matrix elements are furthermore highly redundant. In the action of the 
Hamiltonian, the operator djajaj&fc has a numerical amplitude Vijki- Now this 
operator will be able to connect many dozens of different pairs of Slater deter- 
minants, leading to many dozens of different many-body Hamiltonian matrix 
elements. But only the spectators differ, while the value of the matrix element 
V^ki, up to some phase, remains the same. 

This leads to large redundancies, as illustrated in TableO from factors span- 
ning from 20 to 10 6 . In a two-species system, one can devise a factorization 
algorithm to take partial advantage of this redundancy to reduce the memory 
load, as described in the following subsections. 

3.2. The Hamiltonian with two species 

With two species, the Hamiltonian can be written as a sum of different parts 

H = H X + Hy + H XX + Hyy + H X y . (l4) 

Here H x is a one-body operator, i.e., kinetic energy plus external potential, 
that acts only on species x, H xx is a two-body operator which denotes the 
interaction between two particles of species x, and H xy is a two-body operator 
that denotes the interaction between one particle of species x and one particle 
of species y; the generalizations to species y, H y and H yy , are analogous. The 
one-body terms are easy to work with, so, henceforth, we focus exclusively on 
the two-body interactions. 
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We write 

6 ** = \ E V^a\{x)iL){x)ai{x)a k {x). (15) 

ijkl 

The analogous operator for species y 7 H yy has a similar form. The cross-species 
interaction, that is interaction between a particle of species x and a particle of 
species y, is 

which more broadly can be expanded in a factorized fashion: 

K = Y, v Si 6 ^ 6 f ( 17 ) 

ijkl 

where, for example, 

6$ =al(xy aj (x). (18) 

Because the model space is finite all such expansions are also finite. 

Now, we can sketch out the Hamiltonian matrix element (again, considering 
only the two-body interactions) 

KK/^|if|Mx)k) = {li' x \H xx \h x )8 v , v + {v' y \H yv \v y )6 Wll + (19) 

a b 

This allows us to begin to see the route to factorization and its efficiency. To 
begin with, the component matrix elements, (fi' x \H xx \(ji x } , (/i x \Oa\fi x ), etc., 
are much smaller in number than the full matrix elements, as conservation 
laws/selection rules dramatically restrict the number and coupling of these com- 
ponents. 

We assume the Hamiltonian to be rotationally invariant, which means that 
the Hamiltonian is an angular momentum scalar. (Notice that we also assume 
the number of particles of each species is conserved, a condition that could be 
relaxed although it would lead to additional complications.) Because the Hamil- 
tonian commutes not only with J 2 but also J z , this means that the eigenvalue 
of J z , or M, is conserved. Earlier, we discussed how this allowed us to invoke 
a fixed-M basis, which is easy to construct using Slater determinants. But now 
we go further: because the interaction cannot change M, in Eqs. (fT5|) and p^l) 
we have 

mi + rrij — mu — mi = 0. (20) 

Similarly for parity, 

7T ? ; X TTj X 7Tfc X 7Tf = +1. (21) 

The conservation of these quantum numbers dramatically restricts the number 
and the coupling of the matrix elements, as we will now lay out. 
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proton Slater determinants 



loop over neutron states - 



proton two-body jump 

i 1 

i i 

I M =+5/2 1 



Figure 2: Illustration of factorization of H vv for the example in the text. Because the conjugate 
neutron Slater determinant does not change, the 'two-body jump' cannot change either M n 
or M p . Instead we loop over all (proton) two-body jumps and also loop over all conjugate 
(neutron) Slater determinants. 



3.3. Two-body jumps 

Consider H xx , the interaction between two particles of species x\ H yy works 
exactly the same. As described above, 

(^K^I^Imx)!^) = 04I#x*Ia*»)<w (22) 

What does this mean for computing the matrix element of H xx l Because v = v' 
for the y-basis component, H xx cannot change M y or Ily, which in turn means 
that it cannot change M x or H x . Figure [2] gives a schematic version. 

Thus, although we need the (fj/ x \H xx \fj, x ), we only need them diagonal in 
quantum numbers such as M x , which enormously reduces the amount of in- 
formation. The (fi' x \H xx \fj, x ) we term 'two-body jumps.' Each two-body jump 
consists of the following information: 

• the initial x-state label fi; 

• the final x-state label //; 

• and the value of the matrix element (/j,' x \H xx \fj, x ) (which includes a phase ±1 
from fermion anticommutation). 

These 'jumps' can be stored as simple arrays, along with indexing informa- 
tion that tells us the start and stop points for two-body jumps for a given M x 
etc. Because the jumps are at the heart of the factorization algorithm, they are 
best kept as simple arrays; storing within derived types in modern Fortran, for 
example, can degrade performance by a factor of two. 

It's helpful to remind ourselves of what we learned about the factorization of 
the basis in the previous section. Any x-basis state \fx x ) has quantum numbers 
M x (/j,) etc, and in the total basis this state can and must be coupled to any 
and all y-basis states \v y ) with conjugate quantum numbers. The application 
of H xx is given in this pseudo-code: 
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Loop over x-sectors (that is, sectors of the x-states with fixed quantum 
numbers M x , H x ); 

For this sector, loop over all the x-species two-body jumps; 
this determines the initial and final x-state labels [i and [i! as 
well as the matrix element; 

For this sector, also loop over all conjugate y-states 
v. From this we can determine the initial state on = 
fx(Px) + fy{v y ) and final state f3 f = f x (fj,' x ) + f y {v v ) 
in the basis. The matvec operation is now 

VfWf) -> v f (Pf) + Vi(ai) x (fi'jHxM (23) 

Because of hermiticity, we can also swap Q!j and (3f 
to get at the same time 

Vf(ati) -> Vf(ai) + Vi(pf) x (fi' x \H xx \fi x ) (24) 



Note that these loops depend only on the quantum numbers M, II (and possi- 
bly the weight factor W for many-body truncations, as discussed in |Appcndix D[ ) 
and are otherwise independent of each 2-body jump. In a purely serial code 
looking up the jump information is time consuming, and so looping over the 
spectator y-states is best as the innermost loop, but for example when using 
OpenMP or other shared-memory parallel schemes, we use the loop over the 
spectator y-states the outer loop in order to avoid data collisions. 

To illustrate this algorithm, we return to the example given in Section 12. 21 
and in particular in Table [3l The proton- proton or xx part of the Hamiltonian 
cannot change M n and so also cannot change M p . Now let's consider the specific 
case of the sector of the basis with M p = +5/2 and M n = —2. There are a 
total of 8720 basis states in this sector, so there are possibly 8720 2 = 68 x 10 6 
matrix elements. However, H pp cannot change the neutron Slater determinant, 
so that, as in Eq. (|22|) . v = v' . 

Now in this sector, there are 80 proton Slater determinants with M p = 
5/2, so that in principle there could be 80 2 = 6400 matrix elements (//' \H pp \fx p ) 
(or half that if we take into account hermiticity). But we have a two-body 
interaction and five particles, so that for each nonzero matrix clement there 
are always three static spectators. This reduces the number of nonzero matrix 
elements of (fi' p \H pp \fi p ) to 2677. 

With the two-body jumps in hand, we can easily reconstruct the full matrix 
element, Eq. (|22|) . for this sector of the basis. We loop over all 2677 two-proton 
jumps, which takes us from one proton Slater determinant to another \(j/ p ), 
including the value of the matrix element. We also have a fast, inner loop over 
the 109 neutron Slater determinants \v v ) which does not change. (Note: if we 
parallelize our code with memory sharing, e.g. with OpenMP, it is useful to 
either make the loop over v y the outer loop or to order the jumps based on the 
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Table 6: Number of one- and two-body 'jumps' and storage requirements for representative 
atomic nuclei in different model spaces (described in Appendix B). For storage of nonzero 
matrix elements (penultimate column) we assume each many-body matrix element is stored 
by a 4-byte real number and its location encoded by a single 4-byte integer. Storage of a 
single jump (initial and final Slater determinant for a species, and matrix element and phase) 
requires 13 bytes. All storage (final two columns) are in gigabytes (GB). 



Nuclide 


space 


basis 


# 1-body 


# 2-body 


Store 


Store 






dim 


jumps 


jumps 


m.e.s 


jumps 


28 Si 


sd 


9.4 x 10 4 


4.8 x 10 4 


7.6 x 10 3 


0.2 


0.002 


52 Fe 


pf 


1.1 x 10 8 


4.0 x 10 6 


8.5 x 10 6 


700 


0.16 


56 Ni 


pf 


1.1 x 10 9 


1.5 x 10 7 


4.0 x 10 7 


9800 


0.6 


4 He 


^Vshcll = 8 


2.9 x 10 7 


1.4 x 10 7 


2.9 x 10 7 


1500 


0.6 


4 He 


N — 22 


9 x 10 7 


5.3 x 10 s 


4.7 x 10 9 


9300 


69 


12 C 


^Vshcll = 3 


8.2 x 10 7 


4 x 10 6 


7 x 10 6 


400 


0.14 


u c 


Nsholl = 4 


6 x 10 11 


8 x 10 8 


2.4 x 10 9 


10 7 


43 


12 C 


Mnax = 8 


6 x 10 8 


6 x 10 8 


3 x 10 9 


5200 


45 


13 C 


N max = 6 


3.8 x 10 7 


7 x 10 7 


3 x 10 8 


210 


4.3 



index of the final state and sort on thread boundaries in order to avoid data 
collisions.) We invoke the straightforward indexing of the basis, Eq. © to obtain 
the indices of the initial and final basis states. End result: out of a possible 
68 million matrix elements in this sector, we use just 2677 two-proton jumps 
to find the 2677 x 109 = 291793 nonzero matrix elements (a sparsity of 0.4%). 
(Furthermore, there are only 350 unique values of the matrix elements.) This 
illustrates how factorization can compress, without approximation, an already 
very sparse matrix into a relatively small amount of memory. 

Comparison between storage scheme and factorization is given in Table [5] 
and discussed below in Subsection 13.61 

3.4- One-body jumps 

The action of H xy is more complicated: it moves one particle of species x 
and one particle of species y. Nonetheless, factorization is still viable, and, in 
fact, it greatly reduces the memory storage requirements. 

Using Eq. (jTT)) . the matrix element of H xy is 

Wy\{^\H Xy \^)\Uy) =Y J VaMd^ x ){ V ' y \6^\u y ) (25) 
ab 

We call (fi' x \0^ \(i x ) and {v' v \0^ \v y ) one-body jumps for species x and y, 
respectively. Like two-body jumps, for each one-body jump we must store the 
labels of the initial and final Slater determinants plus the label a of the one-body 
operator O a . 

Because neither the x nor the y Slater determinants are spectators, individ- 
ual quantum numbers such as M x and M y are no longer constant. On the other 
hand, because the total M = M x + M y is constant, we know that M y must 
change in an equal and opposite manner to the change in M x . 
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proton Slater determinants 
proton one-body jump 



neutron one-body jump 



Figure 3: Illustration of factorization of H pn using one-body jumps, for the example in the 
text. The change in M p must be matched by the change in M n . 



So, going back to our example from the previous subsection (Table [3]), if we 
start in the sector M p = +5/2, M n = —2, if the proton one-body jump takes 
us to M p = — 1/2 then we must have a conjugate neutron one-body jump that 
takes us to M n = +1. Figure [3] gives a schematic version. 

For H xx , we looped over the 2-body jumps of species x and then had a 
simple loop over the allowed conjugate states of species y dictated by quantum 
numbers. For H xy , we must first identify the initial and final M-scheme sectors, 
and loop over all the associated one-body x-jumps and all the associated one- 
body y-jumps. The pseudocode is: 
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Loop over all possible initial and final M x , thus also fixing the initial 
and final M y ; 

For the initial and final M x , loop over all the associated one- 
body jumps of species x; this will yield the initial and final 
ir-Slater determinant labels /i, \i' as well as the label a of the 
onc-x operator O a \ 

For the initial and final M y , loop over all the associ- 
ated one-body jumps of species y; this will yield the 
initial and final y-Slater determinant labels is, v 1 as 
well as the label b of the one-y operator Ob', from this 
we can determine the labels of the initial and final 
states, 

Oi = Ufa) + fv("),0f = fM + fy("') 

as well as the the actual value of the matrix element 
V a b] this is somewhat nontrivial to reconstruct. Fi- 
nally we get the matvec update 

Vf(Pf)^Vfifif)+Vi{<*i)V*b (26) 
and can again swap a.i and /3f via hermiticity. 

Going from the subspace (or sector) M p — 5/2, M n — —2 (dimension 8720) 
to the sector M p = —1/2, M„ = 1 (dimension 15,232) the number of possible 
matrix elements is 8720 x 15, 232 = 1.3 x 10 8 ; but there are 337 proton jumps 
and 419 neutron jumps and a total of 337 x 419 = 132, 823 nonzero matrix 
elements, with a sparsity of 0.1%. 

Again, comparison between storage scheme and factorization is given in Ta- 
ble [5] and discussed below in Subsection 13.61 

3.5. Three-body forces 

One can generalize in a straightforward manner the previous algorithm to 
three-body forces. Here, one now has H xxx , H xxy , H xyyi and H yyy . The first and 
the last require 'three-body jumps,' while H xxy and H xyy require combining one- 
and two-body jumps. In Table[7J we compare, for three-body forces, the sparsity 
as well as the memory requirements to store the nonzero matrix elements and to 
store the jumps; these can be compared to data in Tables [1] and |U The matrices 
are, as expected, significantly less sparse, while we retain the significant memory 
efficiency by storing jumps rather than storing the nonzero matrix elements. 

3. 6. Comparison of memory requirements 

Using 'jumps' to store the information about the many-body Hamiltonian 
matrix elements is significantly more efficient than storing the matrix elements, 
as illustrated for two-body interactions in Table [6] the memory requirements 
are 50 to 100 times less for factorization, an enormous savings. Furthermore, 
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Table 7: Storage requirements for three-body interactions for representative atomic nuclei 
in different model spaces (described in Appendix B). The corresponding data for two-body 
interactions can be found in Tables [T] and [6] All storage (final two columns) are in gigabytes 
(GB). 



Nuclide 


space 


basis 


Sparsity 


Store 


Store 






dim 




m.e.s 


jumps 




sd 


9 x 10 4 


0.06 


2.1 


0.01 


52p e 


Pf 


1 x 10 8 


3 x 10- 4 


1.5 x 10 4 


1.6 


4 Hc 


^Vshcll = 8 


3 x 10 7 


0.03 


1.2 x 10 5 


11 


4 He 


N max = W 


2 x 10 6 


0.10 


2000 


16 


12 C 


^Vshcll = 3 


8 x 10 7 


3 x 10" 4 


8800 


1.4 


13 C 


-^v max — 6 


4 x 10 7 


1 x 10~ 3 


6200 


80 


12 C 


Af m ax = 8 


6 x 10 8 


1.5 x 10" 4 


2 x 10 5 


1100 



reconstruction of the matrix elements is very fast; in timing tests it is no more 
than a factor of two slower than simply fetching the matrix element from mem- 
ory, and that is for H xy ; for H xx / yy the timing between factorization and storage 
is similar. Table [7] makes the comparison for three-body forces, where the ratio 
is even larger. 

There are other advantages. Setup of the arrays is very fast, especially if 
one takes the method further as we describe in the next section. Because we 
can 'forecast' the number of operations without actually creating or directly 
counting the nonzero matrix elements, we can quickly obtain that information 
and distribute the work for parallel processing, as we describe in Section [5] 

The take-away lesson is: while the M-scheme Hamiltonian matrix is very 
sparse, one can dramatically reduce the storage requirements further by one or 
two orders of magnitude using factorization; in the case of three-body forces, the 
reduction can be as high as three orders of magnitude. With factorization one 
can run efficiently on a desktop machine problems that would otherwise require 
storing the many-body Hamiltonian on disk, with slow I/O, or distributing the 
matrix elements across many compute nodes on a distributed memory parallel 
machine. Alternately, on the same parallel machine one could tackle a much 
larger problem-50 or 100 times larger, even more in the case of three-body 
forces as discussed in Subsection 13.51 - than with a matrix storage scheme. 
Thus, despite the higher intrinsic complexity of the algorithm (and we note 
that leadership class codes using matrix storage are by no means 'simple'), 
factorization can, using the same computational resources, push the limits of 
calculations further. 

4. Factorization: the next step 

Inspired by ANTOINE [l~8[, our first attempt at on-the-fly CI code was 
REDSTICK (unpublished), initiated while two of us (CWJ and WEO) were at 
Louisiana State University in Baton Rouge, La. When we began to plan our next 
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generation code, BIGSTICK, we first looked at the computational bottlenecks 
in REDSTICK. They were, in decreasing order of importance: 

• Inefficient application of the jumps when making truncations on the many- 
body space (briefly: use of the W weighting index to truncate the many-body 
space, as described in |Appcndix D| further restricts application of the jumps, 
but this was not handled in an efficient manner); 

• Inefficient distribution of the matvec (matrix- vector multiply) operations across 
parallel computational nodes; 

• Inefficient generation of the basis, especially for large relative truncation based 
on weighting W; and 

• Inefficient generation of the one- and two- (and, for three-body interactions, 
three-) body jumps. Note that no other on-the-fiy CI code has 3-body capabil- 
ities; this was our major motivation for writing REDSTICK and BIGSTICK. 

The first bottleneck, inefficient application of the jumps, was addressed by 
organizing both the basis and the jumps by sectors, that is, labeling by M, II 
and W. The second bottleneck, inefficient parallelization, we address below in 
Section The final two bottlenecks we addressed by introducing a new level of 
factorization. 

As described above, the basis is factorized by multiplying together x- and 
y-species Slater determinants (e.g. proton and neutron Slater determinants, or 
up- and down-spin electron Slater determinants) that have conjugate quantum 
numbers. We have taken this strategy to the next level. Each Slater determi- 
nant for given species is itself written as a product of two 'half-Slaters,' one 
constructed from single-particle states with m < 0, and the other from single- 
particle states with m > 0. 

= U L) )U R) )- (27) 

Here 'L' denotes a 'left' half-Slater determinant (hSD) , constructed with single- 
particle states with m < 0. 'R' denotes a 'right' hSD constructed with single- 
particle states with m > 0J, and cprcscnts denotes the quantum num- 
bers associated with each hSD, notably M^'^, , W {L} ^ R \ and, cru- 
cially, the number of particles n (i) ' (fl:) . As usual, M x = M^+M^ R \ etc, while 
Tlx = + n^ R \ Any full basis state is now the product of four half-Slater 
determinants, e.g. \a) = \u: x L ^)\u}^}\uy L ^}\ujy R ^}. However this combining is 
only done implicitly and never explicitly. 

Crucially, and unlike the Slater determinants of species x and y, these half- 
Slater determinants do not have fixed particle number, which now acts as a new 
quantum number. While this adds a layer of complexity, it also offers several 
advantages. In the same way one constructs a very large total basis from much 
smaller lists of x- and y- Slater determinants, the number of required hSDs is 
much smaller still. For any given set of quantum numbers n, M, II (and W), 



1 We use m = for some atomic physics calculations, when the spin quantum number i 
is associated with the two species, and thus otherwise m is associated with orbital angular 
momentum L with integer values. 
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Tabic 8: Factorization of the M-scheme basis [M = 0) for selected atomic nuclei in terms of 
the number of 'half-Slater determinants'. 



Nuclide 


space 


basis 


proton 


proton 






dim. 


SDs 


hSDs 


^ 8 Si 


sd 


9.4 x 10 4 


924 


128 


52p e 


pf 


1.1 x 10 8 


38760 


1696 


56 Ni 


pf 


1.1 x 10 9 


1.2 x 10 5 


2026 


4 He 


A^shcii = 8 


2.9 x 10 7 


28,680 


1522 


4 He 


^ max — -22 


8.6 x 10 9 


3.3 x 10 5 


2.8 x 10 5 


12 C 


A^shcii = 4 


5.9 x 10 11 


4 x 10 6 


1 x 10 5 


12 C 


N max = 8 


5.9 x 10 8 


3 x 10 6 


6 x 10 5 



the list of half-Slaters is small. Examples are given in Table [8j one gets bigger 
savings for more particles and for full-configuration bases. 

Furthermore, the fact that the half-Slaters do not have fixed particle number 
becomes an advantage. Consider: we do not always need all possible Slater 
determinants of a given species, but merely all Slater determinants of a fixed 
set of quantum numbers. 

In particular, when one has large W many-body truncations, generating 
the required Slater determinants, is a nontrivial task. Generating all possible 
Slater determinants and eliminating those unneeded turns out to be horribly 
inefficient. For example, consider a W = 2 cut on 16 0. There are only 1245 
proton Slater determinants needed to construct the basis, but if one naively 
generated all Slater determinants in the single-particle space there would be 76 
million Slater determinants. 

The answer, of course, is instead of generating millions or billions of candi- 
date Slater determinants and throwing away the bulk of them, one creates them 
recursively: start by creating one-particle states, then two-particle states, and 
so on, and by constraining the additive quantum numbers one can arrive at the 
required basis. But even so, this is somewhat wasteful, as one throws away the 
intermediate basis sets. 

The half-Slaters answer this: not only do we construct the half-Slaters re- 
cursively (from the vacuum state we generate all the needed one-particle half- 
Slaters, and from the one-particles half-Slaters we generate all the two-particle 
half-Slaters, and so on), we actually need all or most of these hSDs in order to 
create the Slater determinants. Furthermore, this gives us a route to creating 
the 'jumps.' 

Let us explain in more detail. First, we break the single-particle space, for 
example that found in Table [5J The single-particle states 1, 3, 5, 7, 8, and 9, 
which all have rtij < 0, are grouped together (as 'left' or L states), while the 
remaining single particle states with mj > are group as 'right' or R states. 
Let's focus first on the left states. We start with the vacuum: 000000. From 
the vacuum state we create six one-particle hSDs: 100000, 010000, 001000, etc. 
From each of the one-particle hSDs we create two-particle hSDs; one can do this 
without duplication by only adding a particle to the right of all filled states, that 
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Table 9: Construction of the many-body matrix elements (for two-body interactions) from via 
creation-operator 'hops.' For three-body interactions, the number of jumps and many-body 
matrix elements will increase, but the number of hops is fixed. 



Nuclide 


space 


# nonzero 


_ # 


# 






many-body m.e.s 


jumps 


hops 


2y Si 


sd 


2.6 x 10 b 


1.2 x 10 b 


768 


52 Fe 


pf 


8.9 x 10 10 


1.2 x 10 7 


15280 


56 Ni 


pf 


1.2 x 10 12 


5.6 x 10 7 


20080 


4 He 


iVsheU = 8 


1.8 x 10 11 


4.3 x 10 7 


6 x 10 4 


4 He 


jVmax — 22 


1.1 x 10 12 


5.3 x 10 9 


1.5 x 10 6 


12 C 


ATshell = 3 


5 x 10 10 


1 x 10 7 


1.5 x 10 4 


12 C 


A^shcll = 4 


1.5 x 10 15 


3.3 x 10 9 


1.3 x 10 6 


12 C 


A^max = 8 


6.4 x 10 11 


3.4 x 10 9 


4.2 x 10 6 



is, 

001000 ^ 001100,001010,001001 (28) 

and so on, recursively If one is making a cut on the many-body basis using the 
W weighting, one simply does not create hSDs that would violate the maximum 
W allowed. 

Almost for free we have the action of creation operators on a half-Slater, 
that is, the matrix element 

(u'\a\\u;), (29) 

where n u i = n u + 1. We call such a matrix element a 'hop' and it comes out 
automatically when generating all the half-Slaters with n u + 1 particles from the 
half-Slaters with n w particles. In practice, we sort the hSDs, already grouped 
by the number of particles ra w , by M w , 11^, W w ; because the lists are small, the 
sorting is quick. Then when computing the 'hop' we know the initial and final 
quantum numbers and the search through the list of possible final states is also 
very quick. It is this kind of sorting and searching that is most time-consuming 
in occupation-space CI codes, and by breaking down the basis into half-Slaters 
we speed up the process considerably 

There are typically a small number of these 'hops' and they are, again, 
organized efficiently by quantum numbers; in fact, given the quantum numbers 
associated with uj and u/ the quantum number of the single-particle state i is 
fixed. Some illustrative data on hops is found in Table O 

From the hops one can then build one- and two-body jumps needed for the 
action of the Hamiltonian. The algorithm for doing so is slightly involved, but 
it is quick, because one has eliminated searches. For example, to generate all 
the jumps needed for 52 Fe in the pf shell, it takes less than 30 seconds on a 
standard desktop machine-and each matrix multiplication takes more than 30 
minutes. (For large W cuts, the efficiencies drop; the number of half-Slaters 
needed relative to the final basis dimension is larger, and the time to generate 
jumps is also longer.) 
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Figure 4: For 6 Li in iV max = 6 space (dimension = 197,822), number of matrix elements 
connecting to each basis state. 



Parceling the basis and the interaction also allows for rather fine control for 
parallel work, as discussed in detail in the next section. 



5. Parallelization and load balancing 

All the CI codes described in this paper follow Whitehead's pioneering use 
of the Lanczos algorithm to efficiently find low-lyin g eigen states[13j. (Another 
popular algorithm is Davidson-Liu and its variants [2 a . |26I | , designed primarily 
for diagonal-dominant matrices, a context appropriate for atomic physics but 
not nuclear.) The central operation in the Lanczos algorithm is a matrix-vector 
multiplication (matvee). Since for a M-scheme basis the Hamiltonian many- 
body matrix is very sparse, the key to efficient parallelization and load balancing 
of the matvee operation is to distribute as evenly as possible the non-zero matrix 
elements across the compute nodes. 

In our first-generation code, REDSTICK, we distributed the matvee oper- 
ations by initial basis state. Each basis state, however, has vastly different 
numbers of operations connection to it. Fig. 2] illustrates this for the case of 6 Li 
in an N max = 6 space (two-body interactions only) ; Fig. [5] shows on a log graph 
the same information but sorted. Thus, naively distributing work by basis state 
lead to unbalanced workloads. Sorting as in Fig. [5] will not help to balance 
workloads as "nearby" initial states connect to very different final states. 

The partial grouping can be understood as related to organizing the basis 
into sectors labeled by quantum numbers of the proton Slater determinants. 
When storing matrix elements on core, one can average this out through a 



randomizing, 'round- robin' algorithm [27j . Such a method will not work when 



using quantum numbers to factorizc the Hamiltonian, however. 

While investigating the distribution of the matvee operations, we realized 
that factorization itself gives us the key to parallelization. As explained in 
section III, the Hamiltonian is a sum of one- and two-body terms. Since it is 
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Figure 5: The same as Fig. [4] but sorted on the number of connecting matrix elements.. 

easy to work with the one-body terms, here we focus on the parallelization of 
the application of the two-body terms, i.e., H XXl H yy , and H xy . The action of 
these operators is organized in one- and two-body jumps. The key to efficient 
parallelization of BIGSTICK is to distribute the jumps across parallel cores in 
such a way that each compute core performs (approximately) equal number of 
operations as described below. (A 'core' is either a MPI process, in a pure MPI 
distributed memory programming model, or a thread, in an OpenMP shared 
memory, or hybrid MPI+OpcnMP, programming model.) 

Rather than counting up operations individually to/from each state, we 
break up the Hamiltonian into blocks based upon quantum numbers, that is, 
we organize Hamiltonian operations by the initial and final sectors of the basis 
(where a 'sector' of the basis is labeled by the quantum number M X ,U X ,W X of 
the z-species). In this way, we can compute the number of operations without 
actually counting them individually. In the example discussed in Section III.C, 
we know from 2677 proton two-body jumps and 109 conjugate neutron Slater 
determinants, that we get 2677 x 109 = 291, 793 operations. It is these 'blocks' 
of operations, based upon quantum numbers, that we break up and distribute 
across compute cores, with fairly modest bookkeeping. 

For instance, continuing the same example, suppose we find we want ap- 
proximately 100,000 operations per compute core. For the block of operations 
in our example, we can have proton two-body jumps number 1 through 917 on 
the first compute core, jumps 918 through 1834 on the second compute core, 
and 1835 through 2677 on the third compute core. Each core loops over all 109 
conjugate neutron Slater determinants. 

We find this distribution of the matrix- vector multiplication scales very well. 
Fig. [5] demonstrates scaling from 64 to 4096 cores for the case of 50 Mn in the pf 
shell (basis dimension of 18 million) with a three-body force; the data was taken 
on the Jaguar supercomputer at Oak Ridge National Laboratory in September 
2012. We have done a number of similar studies; for example, with an earlier 
version of BIGSTICK we found similar scaling of the matvec operation from 
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Figure 6: (Color online) Relative speed-up of the matvec operation for 50 Mn in the p/-shell, 
with an M-schcme dimension of 1.8 X 10 7 with three-body interactions. Solid (black) line 
corresponds to the ideal speed-up and broken (red) line to the actual speed-up. Data from 
Jaguar at Oak Ridge National Laboratory in September 2012. 



500 to 10000 cores on the Franklin machine at the National Energy Research 
Computing Center (NERSC) for 52 Fe in the pf shell (dimension 110 million) 
with a two-body interaction. 

At this point the major computation bottlenecks are communication and, 
if one writes the Lanczos vectors to disk, I/O (alternately, one can store the 
Lanczos vectors also across many compute nodes^3l)- All parallel CI codes 
face these same barriers. 



6. Summary and Conclusion 

Configuration- interaction calculations of the many-fermion problem are straight- 
forward: one diagonalizes the Hamiltonian in a many-body basis. If that basis 
is made of Slater determinants with fixed J z , the resulting matrix is very sparse; 
despite the sparsity, however, the number of nonzero matrix elements quickly 
becomes overwhelming (cf. Table [1]). Investigation of the nonzero matrix ele- 
ments reveals a high degree of redundancy, that is, the same numerical value of 
the the matrix element, up to a phase, over and over again (cf. Table [5]). 

Both the sparsity and the redundancy can be understood as the consequence 
of having a two- or three-body interaction embedded in a many-body space, 
with a consequence of many spectator particles. Nonetheless, while the sparsity 
reduces the number of many-body matrix elements, simply determining the 
nonzero matrix clcmcnts-often of the order of just one in a million-is a nontrivial 
problem. 

That very sparsity, however, can be turned to an advantage. In this paper, 
we discussed how, if one has two species of particles and if there are abclian 
quantum numbers (which simply means the quantum numbers are additive or 
multiplicative), one can further compactify the problem via factorization. Fac- 
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torization yields a memory savings of one to almost three orders of magnitude, 
as shown in Tables [6171 furthermore, it can make planning of the parallel distri- 
bution of work straightforward, as one can calculate the number of operations 
without explicitly constructing them. While factorization has been used in nu- 
clear physics codes for well over a decade, recent work taking factorization an 
additional step leads to additional efficiencies. There is a cost to all this, of 
course, in more complicated internal bookkeeping. It is worth pursuing, how- 
ever, as it will allow us to tackle significantly larger problems, than a straight 
matrix storage scheme, on the same computing platform. 

The U.S. Department of Energy supported this investigation through con- 
tracts DE-FG02-96ER40985 and DE-FC02-09ER41587, and through subcon- 
tract B576152 by Lawrence Livermore National Laboratory under contract DE- 
AC52-07NA27344. We would like to thank H. Nam, J. Vary, and P. Maris for 
helpful conversations over the years regarding the development of CI codes. 



Appendix A. Quantum numbers 

In this section, we briefly introduce the concept of quantum numbers for 
non-physicists (e.g., computer scientists and applied mathematicians). 

Quantum numbers label irreducible representations of a group. In quantum 
mechanics, the labeling is done using eigenvalues of commuting operators (28l |: 
often these eigenvalues in turn correspond to physically conserved quantities, 
especially if the group is continuous. 

In classical mechanics, Noether's theorem states that if a Hamiltonian is in- 
variant under an operation, there will be a corresponding conserved quantity. 
Invariance under spatial translation leads to conservation of linear momentum, 
while invariance under spatial rotation leads to conservation of angular momen- 
tum. 

In quantum mechanics, any observable quantity that can be measured is 
represented by a Hermitian, linear operator O; the allowed values that can be 
measured are the eigenvalues of the operator, 

0|tf) = A|#). (A.l) 

Invariance in quantum mechanics is given by commutation: if [H, O] = 0, then 
one can have simultaneous eigenstatcs: 

H\E,X) = E\E,X), G\E,X) = X\E,X). (A.2) 

In this case, A corresponds to a conserved quantity: this value not does change 
under the evolution of the wavefunction through the time-dependent Schodingcr 
equation. For many important conserved quantities, such as total angular mo- 
mentum 

J 2 |*)=j(j + l)?i 2 |tf) (A.3) 
and the z-componcnt of angular momentum 

J Z \V) = mh\^>) (A.4) 
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the eigenvalues can be expressed in simple numbers: here j, m are either integers 
or half-integers. (The convention in this paper is we use lower case letters 
j,m, etc., for single-particle quantum numbers, and capital letters J, M, etc., 
for quantum numbers of many-body systems.) This is the origin of the term 
'quantum numbers,' which in physics refers to eigenvalues that are exactly or 
approximately conserved. 

Conserved quantities lead to selection rules, which mean that certain matrix 
elements must be zero. For example, with a rotationally invariant Hamiltonian, 
total M is conserved, and hence there can be no matrix elements between ba- 
sis states of different total M. Physically, M is related to orientation, which 
for a system governed by a rotationally invariant Hamiltonian cannot change 
spontaneously. 

For a composite system, one needs group theory to combine quantum num- 
bers. Important to us is the fact that some quantum numbers can be combined 
simply: J z is added, and parity is multiplied. (Technically, this is because they 
are represented by abelian groups.) Others quantum numbers, such as total 
angular momentum, associated with non-abelian groups require more sophis- 
ticated tensor algebra. It is the simplicity of additive and/or multiplicative 
quantum numbers that we exploit here. It is also why the M-scheme basis, 
constructed from a simply additive (abelian) quantum number is numerically 
easy, and why the J-scheme basis, while more compact, is also more challenging, 
constructed as it is from more complicated non-abelian quantum numbers. 



Appendix B. Model space for nuclei 

For the nonexpert, we describe here several typical model spaces for CI 
calculations of low-energy nuclear structure. When doing ab initio or no-core 
calculations, one uses harmonic oscillator single-particle orbits Os, Op, Is, Od, and 
so on, where the first number is n, the number of nodes in the radial wavefunc- 
tion. Because spin-orbit splitting is large in nuclei, one also appends total j: 
s i/2i Opi/2) 0^3/2! etc These orbits are grouped together into shells (some- 
times called major shells) by the principal quantum number N = 2n + l. Hence 
Osi/2 has N = 0, 0px/2) O7J3/2 have N = 1, lsi/2 ; 0d 3 / 2 , 0d 5 / 2 have N = 2 and so 
on. Below, in |Appcndix D we describe various truncation schemes, the most 



important for these no-core calculations are either truncating solely on the num- 
ber of shells, so that iV s hcii = 3 include the N = 0, 1, 2 shells; or, the iV m ax or 
NhCl truncation scheme. 

The latter is described in more detail in | Appendix D[ but briefly: to each 
many-body state one assigns an energy which is the sum of the non-interacting 
harmonic oscillator energies, leaving off the zero-point energy of |M2 per par- 
ticle. In that case, one includes only states up to a certain cutoff in this energy 
above the lowest possible energy. For example, consider 4 He. If all four nucle- 
ons are in the 0si/ 2 orbit, the non-interacting energy is Ohfl. (Here hfl simply 
provides the energy scale but the value of fi does not affect the truncation.) If 
one excites 1 particle from the Osx/2 orbit to, say, the 2si/ 2 or the ld 5 / 2 or the 
0<79/2 orbits (all in the N = 4 or 2s-ld-0g major shell), then the state has a 
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non- interacting energy of Ahfl. One can achieve the same value by exciting two 
particles up to the TV = 2 major shell, or all four up to the TV = 1 or Op shell, 
or 1 particle to TV = 1 and another up to TV = 3. We call the set of all states 
with non-interacting energy equal to or less than ATiSl the TV max = 4 space. It 
gets more complicated for more particles. For example, for 12 C the lowest non- 
interacting energy is in fact 8h£l (4 particles in the TV = or Os shell and the 
rest in the TV = 1 or Op shell). An TV max = 4 excitation includes exciting one 
particle from the Os up to the 2s-ld-0g shell, or from the Op (TV = 1) up to the 
2p-lf-0h (TV = 5), or two particles from the Op up to the lp-0/, or 4 particles 
from the Op up to the ls-Od, etc. 

We also use as examples two valence model spaces. The first is the ls-Od 
shell, also known more simply as the s<i-shell. The valence space we choose to 
be the ls 1 / 2 , 0c? 3 / 2 and 0d 5 / 2 orbits, also known as the sd or IsOd space, the 
quantum numbers of which are given in Table [3J along with an inert, frozen 
16 core, that is, (0s 1 / 2 ) 4 (0pi/2) 4 (0p 3 / 2 ) 8 filled configuration. (Because of the 
strong spin-orbit force in nuclei, even in the simplest approximations one must 
consider not only I but also j of each single-particle state.) In addition, we 
consider the pf shell, which assumes an inert 40 Ca core and with an active 
valence space the lpi/2, lp3/2j O/3/2 and 0/ 5 / 2 orbits. Throughout the rest of 
this paper we will give examples built upon these spaces. 



Appendix C. Examples from atomic physics 

Our examples in the text were primarily taken from low-energy nuclear 
physics. In order to provide a different context, we present here some examples 
from atomic physics, electrons around a single atom and cold spin-1/2 atoms in 
a harmonic trap. In both cases we treat "spin-up" and "spin-down" particles as 
separate species (in place of protons and neutrons); thus the orbits are labeled 
by I rather than j. All fermionic properties are preserved, however. 

To begin, we demonstrate how to construct the many-body basis for elec- 
trons around an atom, using the small model space in Table IC.101 Wc assume 
a frozen Ne core, that is, frozen (IS) 2 (2S) 2 (2P) 6 , and have valence 3S, 3P, 
and 373 states. Consider specifically the case of neutral phosphorus, with five 
valence electrons. We take three 'up' electrons and two 'down' electrons: if the 
force lacks a spin-orbit component, then both total L and total S will be good 
quantum numbers. 

We can construct Slater determinants as we did for nuclear examples. So, 
using the number from Table IC.10| the Slater determinant for three spin-up 
particles |0), which can be represented in bit form as 010100001, has 

total M — L z = 2 and parity +. Because we assume the same orbits for 
spin-down, Slater determinants for spin-down electrons are similar. 

When we construct all the states with three spin-up, two spin-down, total 
L z = and total parity = +, as shown in Table [CTTTI there are 252 such states. 
/ 9 \ 

Note that we use all _ =36 of the spin-down Slater determinants, but we 
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Table C.10: Ordered list of single-particle states in the N = 3-shell for atomic electrons. 



State 


I 


mi 


1 








2 


1 


-1 


3 


1 





4 


1 


1 


5 


2 


-2 


6 


2 


-1 


7 


2 





8 


2 


1 


9 


2 


2 



are missing two of the I ^ J = 82; those are the states with Mf = ±4. 

We also consider the valence space composed of TV = 3,4, 5 hydrogen-like 
orbitals, not shown. The details of the orbitals, i.e. the radial dependence, 
is not crucial to our point here, which is to illustrate comparative memory 
requirements. 

Again, taking phosphorus with M w = + basis states, in the iV = 3 va- 
lence space we can construct 242 basis states from 118 Slater determinants, as 
illustration in Table [CTTTT which in turn can be constructed from 89 hSDs. For 
the equivalent basis in the N = 3, 4, 5 space, the basis dimension is 1.5 million, 
constructed from 20,000 Slater determinants and 6800 hSDs. 

Looking at the Hamiltonian matrices, for the N = 3 space the sparsity is 
about 20% and with a redundancy of 18, that is every unique matrix element 
is reused on average 18 times in the many-body Hamiltonian matrix. For the 
N = 3, 4, 5 space, the sparsity is 0.2% and the redundancy is 3400. 

The Hamiltonian matrix for the N = 3 space requires 12,500 operations, 
constructed from 1847 jumps, built from 153 hops; the memory requirements 
for the nonzero many-body matrix elements is 0.5 Mb, while storage of the jumps 
requires only 0.02 Mb. The Hamiltonian matrix for the N = 3,4, 5 space, 1.5 
billion operations, requires about 6 Gb of memory, while the corresponding 3.6 
million jumps require only 47 Mb, which in turn are built from only 19,000 hops. 

Appendix D. Further truncation of the many-body basis 

Often one wants to further truncate the many-body basis, either on the basis 
of physics or on simple computational efficiency. We will discuss two common 
truncation schemes, and then introduce a relatively simple and flexible gen- 
eralization that encompasses these cases. (Nonetheless the following is rather 
technical and casual readers can skim the following section without loss of com- 
prehension for the rest of the paper.) 

In all our considerations we truncate the many-body space based upon the 
single particle space, that is, upon single-particle quantum numbers. One could 
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Table C.ll: Decomposition of the M-scheme basis for 3 spin-up and 2 spin-down electrons, 
using the valence space of Table lC.lOl and assuming a frozen Ne core (hence phosphorus), with 
total L z = Mf + Mj,0 and total parity it = TTf X 7Tj,. Here "t SD" = Slater determinants com- 
posed of spin- up electrons and "J,SD" = Slater determinants composed of spin-down electrons, 
while "combined" refers to the combined many-body basis states. 





#TSDs 




# |SDs 


# combined 


3+ 


3 


-3+ 


1 


3 


3~ 


3 


-3- 


1 


3 


2+ 


4 


-2+ 


2 


8 


2- 


6 


-2" 


2 


12 


1+ 


8 


-1+ 


4 


32 


1- 


8 


-1" 


4 


32 


0+ 


8 


0+ 


4 


32 


CP 


10 


o- 


4 


40 


-1+ 


8 


1+ 


4 


32 


-1- 


8 


1- 


4 


32 


-2+ 


4 


2+ 


2 


8 


—2~ 


6 


2" 


2 


12 


-3+ 


3 


3+ 


1 


3 


-3- 


3 


3- 


1 


3 


Total 


82 




36 


252 



truncate based upon many-body quantum numbers, but that is beyond the 
scope of this paper and of our algorithms. 

The first kind of truncation is sometimes called a particle-hole truncation 
in nuclear physics; in atomic physics (and occasionally in nuclear physics), one 
uses the notation 'singles,' 'doubles,' 'triples,' etc. To understand this truncation 
scheme, begin by considering a space of single-particle state, illustrated in Figure 
ID. 71 The single-particle space can be partitioned into four parts. In the first 
part, labeled 'inert core', the states are all filled and remain filled. In the fourth 
and final part, labeled 'excluded,' no particles are allowed. Both the core and 
excluded parts of the single-particle space need not be considered explicitly, only 
implicitly. In some cases there is no core. 

More important are the second and third sections, labeled 'all valence' and 
'limited valence', respectively. The total number of particles in these combined 
sections is fixed at N v , and this is the valence or active space. 

The difference between the 'limited valence' and the 'all valence' spaces is 
that only some maximal number Ni < N v of particles are allowed in the 'limited 
valence' space. So, for example, suppose we have four valence particles, but only 
allow at most two particles into the 'limited valence' space. In this case the 'all 
valence' might contain four, three, or two particles, while the 'limited valence' 
space might have zero, one, or two particles. In more standard language, Ni — 1 
is called 'one-particle, one-hole' or 'singles', while TV; = 2 is called 'two-particle, 
two-hole' or 'doubles', and so on. There arc no other restrictions aside from 
global restrictions on quantum numbers such as parity and M. 
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In nuclear structure physics, where center-of-mass considerations weigh heav- 
ily, one sometimes invoke a weighted refinement of this scheme. For all but the 
lightest systems, one must work in the laboratory frame, that is, the wavcfunc- 
tion is a function of laboratory coordinates, "J = ^(ri, r%, . . .). It is only the 
relative degrees of freedom that are relevant, however, so ideally one would like 
to be able to factorize this into relative and center-of-mass motion: 

*(ri,r 2 ,r 3 , . . .) = *rd(ri - r 2 ,ri - F 3 , . . .) x *cm(-Rcm) (D.l) 

(note that we have only sketched this factorization). In a harmonic oscilla- 
tor basis and with a translationally invariant interaction, one can achieve this 



factorization exactly, if the many-body basis is truncated as follows [30|, |3jJ, |32j : 

• In the non-interacting harmonic oscillator, each single-particle state has 
an energy = HQ(Ni +3/2). Here iVj is the principle quantum number, which 
is for the Os shell, 1 for the Op shell, 2 for the ls-Od shell, and so on. The 
frequency 17 of the harmonic oscillator is a parameter but its actual value plays 
no role here. 

• We can then assign to each many-body state a noninteracting energy 
Eni = &i, the sum of the individual non- interacting energies of each particle. 
There will be some minimum E m i n and all subsequent non-interacting energies 
will come in steps of 7il7-in fact for states of the same parity, in steps of 2fril. 

• Now choose some N max , and allow only states with noninteracting energy 
En i < -Emin + N max hCl. In practice, restricting states to the same parity means 
that the 'normal' parity will have Eni = -Emm, -Emm + 27117, E m - m + 4M7, . . ., 
-Emin + -^max7il7, while 'abnormal' parity will have Eni = -E m i n + 7il7, i? m i n + 
37L17, . . ., E min + N max %fL. 

This is sometimes call the Nhfl truncation, or simply the energy truncation. 
It is more complicated than the previous 'particle-hole' truncation. We identify 
with each principle quantum number TV^ a major shell; for a 47il7 we can excite 
four particles each up one shell, one particle up four shells, two particles each 
up two shells, one particle up one shell and another up three shells, and so on. 
While complicated, such a truncation allows us to guarantee the center-of-mass 
wavcfunction is a simple Gaussian. 

Both truncation schemes can be described by introducing an additional ad- 
ditive quantum number, which we call the weighting w for single-particle states 
and W for many-body states. Assign to each single-particle state (j>i a non- 
negative integer Wi\ for example, this might be the principal quantum number 
for the spherical harmonic oscillator. We assume that states of a given single- 
particle level (that is, labeled by unique ji,7Ti,(Xi but having distinct m,) share 
the same Wi. The weighting is additive, so that, like M, the W for a given 
many-body state is simply the sum of the WiS of the occupied states. 

Now, the truncation is simply defined by: allow all states with W < W max - 
(Usually Wmax is defined, as in the NhCt truncation, relative to some H^min-) 
To regain the simple 'particle-hole' truncation, we assign w — to the 'all 
valence' single-particle states and w = 1 to the 'limited valence' states, and set 
Wmax = ^Vj. One can devise, however, more complicated weightings. Because 
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Figure D.7: Segregation of single-particle space. 'Inert core' has all states filled. 'Excluded' 
disallows any occupied states. 'All valence' can have states up to the number of valence 
particles filled, while 'Limited valence' can only have fewer states filled (e.g. one, two, three...). 
See text for discussion. 

of the inequality and not a strict equality, W is not a 'good quantum number.' 
However we can use nearly the same machinery to implement the inequality as 
the equality. 

This weighted truncation can be and has been implemented into our CI code. 
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